An Adaptive Data Structure for Nearest Neighbors Search in a General Metric Space
Abstract
We consider the problem of computing nearest neighbors in an arbitrary metric space, particularly a metric space that cannot be easily embedded in $\mathbb{R}^k$. We present a data structure, the Partition Tree, that can be constructed in $O(n \log n)$ time, where $n$ is the number of points to be searched, and that has been shown experimentally to have an average query time that is sublinear in $n$ (roughly $O(\log^{\alpha} n)$ with $4 \le \alpha \le 5$). Our experiments show that this data structure could have applications in bioinformatics, particularly protein secondary structure prediction, where it can be used for similarity search among short sequences of proteins' primary structure.
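The abstract does not say how the Partition Tree is built, so the sketch below illustrates only the general idea it depends on, answering nearest-neighbor queries using nothing but a distance function, with a different and well-known structure: a vantage-point tree. It is not the authors' Partition Tree. The Hamming distance on equal-length sequences and the names `VPNode`, `build`, and `nearest` are assumptions chosen for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Illustrative vantage-point tree (NOT the paper's Partition Tree).
Metric = Callable[[str, str], float]


def hamming(a: str, b: str) -> int:
    """Hamming distance: a metric on equal-length sequences (assumed here)."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


@dataclass
class VPNode:
    point: str                  # the vantage point stored at this node
    radius: float               # median distance from the vantage point to the other points
    inside: Optional["VPNode"]  # points with dist(point, x) < radius
    outside: Optional["VPNode"] # points with dist(point, x) >= radius


def build(points: List[str], dist: Metric, rng: random.Random) -> Optional[VPNode]:
    """Recursively split the points around a randomly chosen vantage point."""
    if not points:
        return None
    i = rng.randrange(len(points))
    vp = points[i]
    rest = points[:i] + points[i + 1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = [(dist(vp, p), p) for p in rest]
    radius = sorted(d for d, _ in dists)[len(dists) // 2]
    inside = [p for d, p in dists if d < radius]
    outside = [p for d, p in dists if d >= radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(node: Optional[VPNode], q: str, dist: Metric,
            best: Tuple[float, Optional[str]] = (math.inf, None)) -> Tuple[float, Optional[str]]:
    """Return (distance, point) of a nearest neighbor of q."""
    if node is None:
        return best
    d = dist(q, node.point)
    if d < best[0]:
        best = (d, node.point)
    # Search the side of the split that contains q first.
    if d < node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, q, dist, best)
    # Triangle inequality: every point x on the far side satisfies
    # dist(q, x) >= |d - radius|, so that side can only hold a strictly
    # closer point when |d - radius| < best distance found so far.
    if abs(d - node.radius) < best[0]:
        best = nearest(far, q, dist, best)
    return best


if __name__ == "__main__":
    seqs = ["ACDEFG", "ACDEFH", "WWWWWW", "KLMNPQ"]
    tree = build(seqs, hamming, random.Random(0))
    print(nearest(tree, "ACDEFF", hamming))
```

The pruning test uses only the triangle inequality, which is why trees of this kind work in any metric space without first embedding the points in a vector space; how much of the tree is actually skipped depends on how the distances are distributed.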
Similar Resources
Anisotropic k-Nearest Neighbor Search Using Covariance Quadtree
We present a variant of the hyper-quadtree that divides a multidimensional space according to the hyperplanes associated with the principal components of the data in each hyperquadrant. Each of the $2^{\lambda}$ hyper-quadrants is a data partition in a λ-dimensional subspace, whose intrinsic dimensionality λ ≤ d is reduced from the root dimensionality d by the principal components analysis, which discards the ...
Using the k-Nearest Neighbor Graph for Proximity Searching in Metric Spaces
Proximity searching consists in retrieving from a database the objects that are close to a query. For this type of searching problem, the most general model is the metric space, where proximity is defined in terms of a distance function. A solution for this problem consists in building an offline index to quickly satisfy online queries. The ultimate goal is to use as few distance computations as p...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet dominate users' lives, they bring not only business opportunities and time savings but also threats such as information theft and system penetration, affecting both hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...
Winner-Update Algorithm for Nearest Neighbor Search
This paper presents an algorithm, called the winner-update algorithm, for accelerating the nearest neighbor search. By constructing a hierarchical structure for each feature point in the $l_p$ metric space, this algorithm can save a large amount of computation at the expense of moderate preprocessing and twice the memory storage. Given a query point, the cost for computing the distances from this p...
MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions
High-dimensional indexing has been widely used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, well-known data structures such as the kd-tree, R-tree, and M-tree are known to suffer in their performance over...
Publication date: 2010